Latent reasoning
Instead of forcing LLMs to output tokens at each reasoning step, we can keep the final hidden-state embedding and let the LLM iterate on those embeddings directly (see also Embedding). This may be similar to how people think without uttering their thoughts. Reasoning this way may allow LLMs to reason more freely and in a more nuanced manner, without "collapsing" each thought into a specific meaning (a discrete token).
- Hao2024training ("Chain of Continuous Thought" - Coconut)
- Geiping2025scaling
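The contrast above can be sketched with a toy model: standard decoding collapses each step into a discrete token via logits and an argmax, while Coconut-style latent reasoning feeds the last hidden state straight back as the next input embedding. This is a minimal illustration with randomly initialized modules, not the papers' actual architectures; the module names and sizes are made up for the sketch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, d_model = 50, 16

# Toy stand-in for an LLM: token embedding, one transformer layer, LM head.
embed = nn.Embedding(vocab, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
lm_head = nn.Linear(d_model, vocab)

prompt = torch.randint(0, vocab, (1, 5))  # (batch, seq)

# Standard chain-of-thought decoding: each step "collapses" the hidden
# state into one token, then re-embeds that token.
h = embed(prompt)
for _ in range(3):
    out = layer(h)
    tok = lm_head(out[:, -1:]).argmax(-1)   # hidden state -> discrete token
    h = torch.cat([h, embed(tok)], dim=1)   # token -> embedding, appended

# Latent reasoning (Coconut-style sketch): append the last hidden state
# directly as the next input, skipping the logits -> token round trip.
z = embed(prompt)
for _ in range(3):
    out = layer(z)
    z = torch.cat([z, out[:, -1:]], dim=1)  # continuous "thought" re-enters
```

After three steps both sequences have grown from 5 to 8 positions, but only the latent path preserves the full continuous state between steps.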